OpenCL Evaluation for Numerical Linear Algebra Library Development

Authors

  • Peng Du
  • Piotr Luszczek
  • Jack Dongarra
Abstract

With the help of CUDA [7], [6], many applications have improved their performance by using GPUs. In our project, Matrix Algebra on GPU and Multicore Architectures (MAGMA) [10], we focus mainly on dense linear algebra routines similar to those in LAPACK [1]. Besides CUDA, there are frameworks that allow platform-independent programming for GPUs. The three main ones are: 1) DirectCompute from Microsoft, 2) OpenGL Shading Language (GLSL), and 3) OpenCL. DirectCompute allows access to graphics cards from multiple vendors. However, it is specific to Microsoft Windows and is therefore not portable between host operating systems (OS). GLSL [8] is portable across both GPU hardware and the host OS. However, it is geared specifically towards programming new graphics effects and does not have a scientific focus. OpenCL [3] has been designed for general-purpose computing on GPUs (GPGPU). It is an open standard maintained by the Khronos Group with the backing of major graphics hardware vendors as well as large computer industry vendors interested in off-loading computations to GPUs. As a result, there are working OpenCL implementations for graphics cards and, in addition, an implementation that works without a GPU by off-loading computations to multicore processors. OpenCL therefore offers portability across GPU hardware, OS software, and multicore processors, which makes it our choice for implementing a portable numerical linear algebra library.

Similar resources

Programming TI KeyStone II-based ARM + DSP Devices using Industry Standard Tools

Heterogeneous multicore architectures are used to meet the demanding needs of data and signal processing intensive applications such as those found in the high performance computing and defense markets. While heterogeneity is a powerful design tool for optimizing hardware for specific classes of algorithms, it comes with the challenge of how to best take advantage of the available cores in a si...

CLBlast: A Tuned OpenCL BLAS Library

This work demonstrates how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, astrophysi...

An Automatic OpenCL Compute Kernel Generator for Basic Linear Algebra Operations

An automatic OpenCL compute kernel generator framework for linear algebra operations is presented. It allows for specifying matrix and vector operations in high-level C++ code, while the low-level details of OpenCL compute kernel generation and handling are dealt with in the background. Our approach releases users from considerable additional effort required for learning the details of programm...

Delayed Evaluation and Runtime Code Generation as a means to Producing High Performance Numerical Software Project Report

Attaining both performance and abstraction is a challenge often faced by software engineers. This is especially the case with mathematical software, where despite the existence of languages such as C++ which enable the usage of numerical abstractions, Fortran remains a popular language due to the high effectiveness of available compilers. The pursuit of high performance numerical code with C++ abs...

Case studies on the development of ScaLAPACK and the NAG Numerical PVM Library

In this paper we look at the development of ScaLAPACK, a software library for dense and banded numerical linear algebra, and the NAG Numerical PVM Library, which includes software for dense and sparse linear algebra, quadrature, optimization and random number generation. Both libraries are aimed at distributed memory machines, including networks of workstations. The paper concentrates on the un...

Publication date: 2010